{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Router QueryEngine and SubQuestion QueryEngine\n",
    "\n",
    "In this notebook we will demonstrate:\n",
    "\n",
    "1. **RouterQueryEngine** - Handle user queries to choose from predefined indices.\n",
    "2. **SubQuestionQueryEngine** - breaks down the complex query into sub-questions for each relevant data source, then gathers all the intermediate responses and synthesizes a final response.\n",
    "\n",
    "[RouterQueryEngine Documentation](https://docs.llamaindex.ai/en/stable/examples/query_engine/routerqueryengine/)\n",
    "\n",
    "[SubQuestionQueryEngine Documentation](https://docs.llamaindex.ai/en/stable/examples/query_engine/sub_question_query_engine/)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Router QueryEngine\n",
    "\n",
    "Routers act as specialized modules that handle user queries and choose from a set of predefined options, each defined by specific metadata.\n",
    "\n",
    "There are two main types of core router modules:\n",
    "\n",
    "1. **LLM Selectors**: These modules present the available options as a text prompt and use the LLM text completion endpoint to make decisions.\n",
    "\n",
    "2. **Pydantic Selectors**: These modules format the options as Pydantic schemas and pass them to a function-calling endpoint, returning the results as Pydantic objects."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Installation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama-index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# NOTE: This is ONLY necessary in jupyter notebook.\n",
    "# Details: Jupyter runs an event-loop behind the scenes.\n",
    "#          This results in nested event-loops when we start an event-loop to make async queries.\n",
    "#          This is normally not allowed, we use nest_asyncio to allow it for convenience.\n",
    "import nest_asyncio\n",
    "\n",
    "nest_asyncio.apply()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "NumExpr defaulting to 2 threads.\n"
     ]
    }
   ],
   "source": [
    "import logging\n",
    "import sys\n",
    "\n",
    "# Set up the root logger\n",
    "logger = logging.getLogger()\n",
    "logger.setLevel(logging.INFO)  # Set logger level to INFO\n",
    "\n",
    "# Clear out any existing handlers\n",
    "logger.handlers = []\n",
    "\n",
    "# Set up the StreamHandler to output to sys.stdout (Colab's output)\n",
    "handler = logging.StreamHandler(sys.stdout)\n",
    "handler.setLevel(logging.INFO)  # Set handler level to INFO\n",
    "\n",
    "# Add the handler to the logger\n",
    "logger.addHandler(handler)\n",
    "\n",
    "from llama_index.core import (\n",
    "    VectorStoreIndex,\n",
    "    SummaryIndex,\n",
    "    SimpleDirectoryReader,\n",
    "    ServiceContext,\n",
    "    StorageContext,\n",
    ")\n",
    "\n",
    "import openai\n",
    "import os\n",
    "from IPython.display import display, HTML\n",
    "\n",
    "\n",
    "# Setup openai api key\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"sk-...\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Download Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--2024-05-16 05:27:42--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt\n",
      "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.110.133, 185.199.111.133, ...\n",
      "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 75042 (73K) [text/plain]\n",
      "Saving to: ‘data/paul_graham/paul_graham_essay.txt’\n",
      "\n",
      "\r\n",
      "          data/paul   0%[                    ]       0  --.-KB/s               \r\n",
      "data/paul_graham/pa 100%[===================>]  73.28K  --.-KB/s    in 0.002s  \n",
      "\n",
      "2024-05-16 05:27:42 (46.3 MB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "!mkdir -p 'data/paul_graham/'\n",
    "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load documents\n",
    "documents = SimpleDirectoryReader(\"data/paul_graham\").load_data()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create Nodes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.text_splitter import SentenceSplitter\n",
    "\n",
    "# create parser and parse document into nodes\n",
    "parser = SentenceSplitter(chunk_size=1024, chunk_overlap=100)\n",
    "nodes = parser(documents)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create VectorStoreIndex and SummaryIndex."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n"
     ]
    }
   ],
   "source": [
    "# Summary Index for summarization questions\n",
    "summary_index = SummaryIndex(nodes)\n",
    "\n",
    "# Vector Index for answering specific context questions\n",
    "vector_index = VectorStoreIndex(nodes)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define Query Engines.\n",
    "\n",
    "1. Summary Index Query Engine.\n",
    "2. Vector Index Query Engine."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Summary Index Query Engine\n",
    "summary_query_engine = summary_index.as_query_engine(\n",
    "    response_mode=\"tree_summarize\",\n",
    "    use_async=True,\n",
    ")\n",
    "\n",
    "# Vector Index Query Engine\n",
    "vector_query_engine = vector_index.as_query_engine()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Build summary index and vector index tools"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.tools.query_engine import QueryEngineTool, ToolMetadata\n",
    "\n",
    "# Summary Index tool\n",
    "summary_tool = QueryEngineTool.from_defaults(\n",
    "    query_engine=summary_query_engine,\n",
    "    description=\"Useful for summarization questions related to Paul Graham eassy on What I Worked On.\",\n",
    ")\n",
    "\n",
    "# Vector Index tool\n",
    "vector_tool = QueryEngineTool.from_defaults(\n",
    "    query_engine=vector_query_engine,\n",
    "    description=\"Useful for retrieving specific context from Paul Graham essay on What I Worked On.\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define Router Query Engine\n",
    "\n",
    "Various selectors are at your disposal, each offering unique characteristics.\n",
    "\n",
    "Pydantic selectors, supported exclusively by gpt-4 and the default gpt-3.5-turbo, utilize the OpenAI Function Call API. Instead of interpreting raw JSON, they yield pydantic selection objects.\n",
    "\n",
    "On the other hand, LLM selectors employ the LLM to generate a JSON output, which is then parsed to query the relevant indexes.\n",
    "\n",
    "For both selector types, you can opt to route to either a single index or multiple indexes."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### PydanticSingleSelector\n",
    "\n",
    "Use the OpenAI Function API to generate/parse pydantic objects under the hood for the router selector."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.query_engine.router_query_engine import RouterQueryEngine\n",
    "from llama_index.core.selectors.llm_selectors import (\n",
    "    LLMSingleSelector,\n",
    "    LLMMultiSelector,\n",
    ")\n",
    "from llama_index.core.selectors.pydantic_selectors import (\n",
    "    PydanticMultiSelector,\n",
    "    PydanticSingleSelector,\n",
    ")\n",
    "\n",
    "# Create Router Query Engine\n",
    "query_engine = RouterQueryEngine(\n",
    "    selector=PydanticSingleSelector.from_defaults(),\n",
    "    query_engine_tools=[\n",
    "        summary_tool,\n",
    "        vector_tool,\n",
    "    ],\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "HTTP Request: POST https://api.openai.com/v1/chat/completions \"HTTP/1.1 200 OK\"\n",
      "Selecting query engine 0: The choice is specifically related to summarization questions about Paul Graham's essay on What I Worked On..\n",
      "HTTP Request: POST https://api.openai.com/v1/chat/completions \"HTTP/1.1 200 OK\"\n",
      "HTTP Request: POST https://api.openai.com/v1/chat/completions \"HTTP/1.1 200 OK\"\n",
      "HTTP Request: POST https://api.openai.com/v1/chat/completions \"HTTP/1.1 200 OK\"\n"
     ]
    }
   ],
   "source": [
    "response = query_engine.query(\"What is the summary of the document?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<p style=\"font-size:20px\">The document chronicles Paul Graham's journey through various endeavors, including his experiences with writing, programming, and founding software companies like Viaweb and Y Combinator. It discusses his exploration of painting, personal challenges such as his mother's illness, and his decision to step back from Y Combinator to focus on painting before returning to Lisp programming with the development of a new dialect called Bel. The narrative also covers Graham's reflections on his work choices, the transition of Y Combinator's leadership to Sam Altman, and his contemplation on future projects and the impact of customs in evolving fields.</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display(HTML(f'<p style=\"font-size:20px\">{response.response}</p>'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### LLMSingleSelector\n",
    "\n",
    "Utilize OpenAI (or another LLM) to internally interpret the generated JSON and determine a sub-index for routing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create Router Query Engine\n",
    "query_engine = RouterQueryEngine(\n",
    "    selector=LLMSingleSelector.from_defaults(),\n",
    "    query_engine_tools=[\n",
    "        summary_tool,\n",
    "        vector_tool,\n",
    "    ],\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "HTTP Request: POST https://api.openai.com/v1/chat/completions \"HTTP/1.1 200 OK\"\n",
      "Selecting query engine 0: This choice indicates that the summary is related to summarization questions specifically about Paul Graham's essay on What I Worked On..\n",
      "HTTP Request: POST https://api.openai.com/v1/chat/completions \"HTTP/1.1 200 OK\"\n",
      "HTTP Request: POST https://api.openai.com/v1/chat/completions \"HTTP/1.1 200 OK\"\n",
      "HTTP Request: POST https://api.openai.com/v1/chat/completions \"HTTP/1.1 200 OK\"\n"
     ]
    }
   ],
   "source": [
    "response = query_engine.query(\"What is the summary of the document?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<p style=\"font-size:20px\">The document chronicles Paul Graham's journey through various projects and endeavors, from his early experiences with writing and programming to his involvement in building software companies like Viaweb and Y Combinator. It details his exploration of different projects, challenges faced, decisions made, and eventual transition to focusing on painting and writing essays. The narrative also discusses his experimentation with the Lisp programming language and the development of a new Lisp dialect called Bel. The document concludes with Graham reflecting on his past projects and contemplating his future endeavors, emphasizing the importance of pursuing projects aligned with personal goals and interests.</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display(HTML(f'<p style=\"font-size:20px\">{response.response}</p>'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "HTTP Request: POST https://api.openai.com/v1/chat/completions \"HTTP/1.1 200 OK\"\n",
      "Selecting query engine 1: This choice is more relevant as it focuses on retrieving specific context from Paul Graham's essay on What I Worked On, which would likely provide information on what he did after RICS..\n",
      "HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
      "HTTP Request: POST https://api.openai.com/v1/chat/completions \"HTTP/1.1 200 OK\"\n"
     ]
    }
   ],
   "source": [
    "response = query_engine.query(\"What did Paul Graham do after RICS?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<p style=\"font-size:20px\">Paul Graham started painting after RICS.</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display(HTML(f'<p style=\"font-size:20px\">{response.response}</p>'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### PydanticMultiSelector\n",
    "\n",
    "If you anticipate queries being directed to multiple indexes, it's advisable to use a multi-selector. This selector dispatches the query to various sub-indexes and subsequently aggregates the responses through a summary index to deliver a comprehensive answer."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Let's create a simplekeywordtable index and corresponding tool."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import SimpleKeywordTableIndex\n",
    "\n",
    "keyword_index = SimpleKeywordTableIndex(nodes)\n",
    "\n",
    "keyword_query_engine = keyword_index.as_query_engine()\n",
    "\n",
    "keyword_tool = QueryEngineTool.from_defaults(\n",
    "    query_engine=keyword_query_engine,\n",
    "    description=\"Useful for retrieving specific context using keywords from Paul Graham essay on What I Worked On.\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Build a router query engine."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine = RouterQueryEngine(\n",
    "    selector=PydanticMultiSelector.from_defaults(),\n",
    "    query_engine_tools=[vector_tool, keyword_tool, summary_tool],\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "HTTP Request: POST https://api.openai.com/v1/chat/completions \"HTTP/1.1 200 OK\"\n",
      "Selecting query engine 0: Retrieving specific context from Paul Graham essay on What I Worked On can provide detailed information about noteable events and people from the author's time at Interleaf and YC..\n",
      "HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
      "HTTP Request: POST https://api.openai.com/v1/chat/completions \"HTTP/1.1 200 OK\"\n",
      "Selecting query engine 1: Retrieving specific context using keywords from Paul Graham essay on What I Worked On can help identify key events and people related to the author's time at Interleaf and YC..\n",
      "> Starting query: What were noteable events and people from the authors time at Interleaf and YC?\n",
      "query keywords: ['noteable', 'authors', 'time', 'interleaf', 'yc', 'events', 'people']\n",
      "> Extracted keywords: ['time', 'interleaf', 'yc', 'people']\n",
      "HTTP Request: POST https://api.openai.com/v1/chat/completions \"HTTP/1.1 200 OK\"\n",
      "Combining responses from multiple query engines.\n",
      "HTTP Request: POST https://api.openai.com/v1/chat/completions \"HTTP/1.1 200 OK\"\n"
     ]
    }
   ],
   "source": [
    "# This query could use either a keyword or vector query engine, so it will combine responses from both\n",
    "response = query_engine.query(\n",
    "    \"What were noteable events and people from the authors time at Interleaf and YC?\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<p style=\"font-size:20px\">Notable events from the author's time at Interleaf include working on software development, observing inefficiencies in managing versions and ports, and learning about the dynamics of technology companies. Notable people mentioned from this time include Robert and Trevor, with whom the author worked on developing software components like the editor, shopping cart, and manager. \n",
       "\n",
       "During the author's time at Y Combinator (YC), notable events include working on various projects such as Hacker News, transitioning leadership to Sam Altman, and shifting focus towards painting after leaving YC. Notable people mentioned from this time include Julian, who provided seed funding for Viaweb, and Robert Morris, who advised the author about not letting YC be the last significant endeavor.</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display(HTML(f'<p style=\"font-size:20px\">{response.response}</p>'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## SubQuestion Query Engine\n",
    "\n",
    "Here, we will demonstrate how to use a sub-question query engine to address the challenge of answering a complex query using multiple data sources.\n",
    "\n",
    "The SubQuestion Query Engine first breaks down the complex query into sub-questions for each relevant data source, then gathers all the intermediate responses and synthesizes a final response."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Download Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--2024-05-16 05:36:06--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10k/uber_2021.pdf\n",
      "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...\n",
      "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 1880483 (1.8M) [application/octet-stream]\n",
      "Saving to: ‘data/10k/uber_2021.pdf’\n",
      "\n",
      "\r\n",
      "data/10k/uber_2021.   0%[                    ]       0  --.-KB/s               \r\n",
      "data/10k/uber_2021. 100%[===================>]   1.79M  --.-KB/s    in 0.01s   \n",
      "\n",
      "2024-05-16 05:36:06 (184 MB/s) - ‘data/10k/uber_2021.pdf’ saved [1880483/1880483]\n",
      "\n",
      "--2024-05-16 05:36:06--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10k/lyft_2021.pdf\n",
      "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.109.133, ...\n",
      "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 1440303 (1.4M) [application/octet-stream]\n",
      "Saving to: ‘data/10k/lyft_2021.pdf’\n",
      "\n",
      "data/10k/lyft_2021. 100%[===================>]   1.37M  --.-KB/s    in 0.01s   \n",
      "\n",
      "2024-05-16 05:36:06 (120 MB/s) - ‘data/10k/lyft_2021.pdf’ saved [1440303/1440303]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "!mkdir -p 'data/10k/'\n",
    "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10k/uber_2021.pdf' -O 'data/10k/uber_2021.pdf'\n",
    "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10k/lyft_2021.pdf' -O 'data/10k/lyft_2021.pdf'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "lyft_docs = SimpleDirectoryReader(\n",
    "    input_files=[\"./data/10k/lyft_2021.pdf\"]\n",
    ").load_data()\n",
    "uber_docs = SimpleDirectoryReader(\n",
    "    input_files=[\"./data/10k/uber_2021.pdf\"]\n",
    ").load_data()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loaded lyft 10-K with 238 pages\n",
      "Loaded Uber 10-K with 307 pages\n"
     ]
    }
   ],
   "source": [
    "print(f\"Loaded lyft 10-K with {len(lyft_docs)} pages\")\n",
    "print(f\"Loaded Uber 10-K with {len(uber_docs)} pages\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create Indices"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
      "HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
      "HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
      "HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
      "HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
      "HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
      "HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
      "HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
      "HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n"
     ]
    }
   ],
   "source": [
    "lyft_index = VectorStoreIndex.from_documents(lyft_docs)\n",
    "uber_index = VectorStoreIndex.from_documents(uber_docs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define Query Engines"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "lyft_engine = lyft_index.as_query_engine(similarity_top_k=3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "uber_engine = uber_index.as_query_engine(similarity_top_k=3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
      "HTTP Request: POST https://api.openai.com/v1/chat/completions \"HTTP/1.1 200 OK\"\n"
     ]
    }
   ],
   "source": [
    "response = await lyft_engine.aquery(\n",
    "    \"What is the revenue of Lyft in 2021? Answer in millions with page reference\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<p style=\"font-size:20px\">The revenue of Lyft in 2021 was $3.208 billion. (Page reference: 79)</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display(HTML(f'<p style=\"font-size:20px\">{response.response}</p>'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
      "HTTP Request: POST https://api.openai.com/v1/chat/completions \"HTTP/1.1 200 OK\"\n"
     ]
    }
   ],
   "source": [
    "response = await uber_engine.aquery(\n",
    "    \"What is the revenue of Uber in 2021? Answer in millions, with page reference\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<p style=\"font-size:20px\">The revenue of Uber in 2021 was $17,455 million. (Reference: page 77)</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display(HTML(f'<p style=\"font-size:20px\">{response.response}</p>'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define QueryEngine Tools"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine_tools = [\n",
    "    QueryEngineTool(\n",
    "        query_engine=lyft_engine,\n",
    "        metadata=ToolMetadata(\n",
    "            name=\"lyft_10k\",\n",
    "            description=\"Provides information about Lyft financials for year 2021\",\n",
    "        ),\n",
    "    ),\n",
    "    QueryEngineTool(\n",
    "        query_engine=uber_engine,\n",
    "        metadata=ToolMetadata(\n",
    "            name=\"uber_10k\",\n",
    "            description=\"Provides information about Uber financials for year 2021\",\n",
    "        ),\n",
    "    ),\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### SubQuestion QueryEngine"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.query_engine.sub_question_query_engine import (\n",
    "    SubQuestionQueryEngine,\n",
    ")\n",
    "\n",
    "sub_question_query_engine = SubQuestionQueryEngine.from_defaults(\n",
    "    query_engine_tools=query_engine_tools\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Querying"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "HTTP Request: POST https://api.openai.com/v1/chat/completions \"HTTP/1.1 200 OK\"\n",
      "Generated 4 sub questions.\n",
      "\u001b[1;3;38;2;237;90;200m[uber_10k] Q: What was the revenue of Uber in 2020?\n",
      "\u001b[0m\u001b[1;3;38;2;90;149;237m[uber_10k] Q: What was the revenue of Uber in 2021?\n",
      "\u001b[0m\u001b[1;3;38;2;11;159;203m[lyft_10k] Q: What was the revenue of Lyft in 2020?\n",
      "\u001b[0m\u001b[1;3;38;2;155;135;227m[lyft_10k] Q: What was the revenue of Lyft in 2021?\n",
      "\u001b[0mHTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
      "HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
      "HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
      "HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
      "HTTP Request: POST https://api.openai.com/v1/chat/completions \"HTTP/1.1 200 OK\"\n",
      "\u001b[1;3;38;2;155;135;227m[lyft_10k] A: $3,208,323\n",
      "\u001b[0mHTTP Request: POST https://api.openai.com/v1/chat/completions \"HTTP/1.1 200 OK\"\n",
      "\u001b[1;3;38;2;11;159;203m[lyft_10k] A: $2,364,681\n",
      "\u001b[0mHTTP Request: POST https://api.openai.com/v1/chat/completions \"HTTP/1.1 200 OK\"\n",
      "\u001b[1;3;38;2;237;90;200m[uber_10k] A: $11,139 million\n",
      "\u001b[0mHTTP Request: POST https://api.openai.com/v1/chat/completions \"HTTP/1.1 200 OK\"\n",
      "\u001b[1;3;38;2;90;149;237m[uber_10k] A: $17,455\n",
      "\u001b[0mHTTP Request: POST https://api.openai.com/v1/chat/completions \"HTTP/1.1 200 OK\"\n"
     ]
    }
   ],
   "source": [
    "response = await sub_question_query_engine.aquery(\n",
    "    \"Compare revenue growth of Uber and Lyft from 2020 to 2021\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<p style=\"font-size:20px\">Lyft's revenue grew by approximately 35.4% from 2020 to 2021, while Uber's revenue increased by around 56.8% during the same period.</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display(HTML(f'<p style=\"font-size:20px\">{response.response}</p>'))"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
